Solved: Re: Spark |
您所在的位置:网站首页 › mkdirs failed to create file › Solved: Re: Spark |
Hi,
I have an issue with Spark, the job failed with this error message :
scala> someDF.write.mode(SaveMode.Append).parquet("file:///data/bbox/tmp")[Stage 0:> (0 + 2) / 2]18/06/05 12:37:39 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, dec-bb-dl03.bbox-dec.lab.oxv.fr, executor 1): java.io.IOException: Mkdirs failed to create file:/data/bbox/tmp/_temporary/0/_temporary/attempt_201806051237_0000_m_000000_0 (exists=false, cwd=file:/yarn/nm/usercache/hdfs/appcache/application_1527756804026_0065/container_e33_1527756804026_0065_01_000002) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:447) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:926) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804) at parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:225) at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:311) at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetRelation.scala:94) at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anon$3.newInstance(ParquetRelation.scala:286) at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:129) at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:255) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)18/06/05 12:37:39 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2, dec-bb-dl03.bbox-dec.lab.oxv.fr, executor 1): java.io.IOException: Mkdirs failed to create file:/data/bbox/tmp/_temporary/0/_temporary/attempt_201806051237_0000_m_000000_1 (exists=false, cwd=file:/yarn/nm/usercache/hdfs/appcache/application_1527756804026_0065/container_e33_1527756804026_0065_01_000002) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:447) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:926) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804) at parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:225) at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:311) at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetRelation.scala:94) at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anon$3.newInstance(ParquetRelation.scala:286) at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:129) at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:255) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
We use CDH 5.14 with the Spark included into the CDH (1.6.0), we think about an version incompatibility issue.
First I tried to change directory rights (777 or give write right to hadoop group), but it didn't work.
Any idea ?
Julien.
|
CopyRight 2018-2019 办公设备维修网 版权所有 豫ICP备15022753号-3 |